Type validation of interpolations #578

odelalleau · 2021-03-03T02:32:00Z

Validate and convert interpolation results to their intended type

This PR fixes several issues:

For nested resolvers (ex: ${f:${g:x}}), intermediate resolver
outputs (of g in this example) were wrapped in a ValueNode just to
be unwrapped immediately, which was wasteful. This commit pushes the
node wrapping to the very last step of the interpolation resolution.
There was no type checking to make sure that the result of an
interpolation had a type consistent with the node's type (when
specified). Now a check is made and the interpolation result may be
converted into the desired type (see Overriding configuration using an environment doesn't trigger type checks when merging into Structured config #488).
If a resolver interpolation returns a dict / list, it is now wrapped
into a DictConfig / ListConfig, instead of a ValueNode. This makes it
possible to generate configs from resolvers (see Allow resolvers to output a dict / list to generate a DictConfig / ListConfig #540). These configs
are read-only since they are being re-generated on each access.

Fixes #488
Fixes #540

omry

I know those two features are related, but this would have been easier to review as two pull requests:

validating and converting on interpolation access
converting dict and list to DictConfig and ListConfig when returned from custom resolvers.

omry · 2021-03-04T22:20:07Z

docs/source/structured_config.rst

+    omegaconf.errors.ValidationError: Value 'string' could not be converted to Integer
+        full_key: int_key
+        object_type=Interpolation
+    >>> cfg.str_key = 1234  # convert int to str (assignment)


This conversion has nothing to do with interpolation and we are thinking about deprecating it anyway.

Suggested change

>>> cfg.str_key = 1234 # convert int to str (assignment)

>>> cfg.str_key = "1234" # string value

Yeah so that was related to how the original text also mentioned assignment... all fixed in a80ae95

docs/source/structured_config.rst

docs/source/usage.rst

news/488.api_change

omegaconf/base.py

omegaconf/nodes.py

omegaconf/dictconfig.py

tests/test_interpolation.py

omegaconf/nodes.py

odelalleau · 2021-03-05T00:57:03Z

I know those two features are related, but this would have been easier to review as two pull requests:

validating and converting on interpolation access

converting dict and list to DictConfig and ListConfig when returned from custom resolvers.

I hear you, although to be clear, in that case what happened is that the code I was writing for (1) turned out to also address (2). It's not like one was written on top of the other and combined in a single commit.

omegaconf/dictconfig.py

odelalleau · 2021-03-05T03:49:20Z

@omry I think I have addressed all comments except #578 (comment) on which I'd like to have your input first.

Also, something I just realized, is that if we have an AnyNode with optional == False, then we will accept None as an interpolation result. The obvious "fix" is to remove the and not isinstance(value, AnyNode) condition gating the call to validate_and_convert(), however the problem is that currently AnyNode.validate_and_convert() actually rejects lists and dcitionaries. For instance on master this currently crashes:

cfg = OmegaConf.create({"x": "???"})
cfg.x = {1, 2, 3}

I see 4 options:

Keep the current code from this PR and pretend it's ok not to validate interpolation results for an AnyNode (even if it is None and the node is not supposed to be optional)
Only check for None vs. the is_optional flag (without calling AnyNode.validate_and_convert()
Validate interpolation results with AnyNode.validate_and_convert() and change it to accept lists and dicts
Validate interpolation results with AnyNode.validate_and_convert() and keep it as it is

My preference would be for (2) or (3) (option (4) seems really bad as it would break a bunch of stuff, just mentioning it for the sake of completeness)

omry · 2021-03-07T19:20:19Z

I am not addressing any of your other comments yet, but I wanted at a high level (and in response to one of your questions):

This is what I think:
validating a container is very complicated and adds a lot of challenges. For example, how does all this interact with caching? things gets pretty dicey if you cache a DictConfig node that is actually pointing to a dynamic datastructure (through interpolation for example).

Doing it properly would require a lot of thinking and I just thinking about it (before reading your comments) made me realize we might want to drop the feature of generating DictConfig/ListConfig from 2.1 and reconsider for 2.2.

We need to wrap things up for 2.1 and this one seems like a pretty deep rabbit hole.

I think we have two options:

try to support this generating dictconfig and listconfig without validation at all (and maybe even produce an error when a user tries that, telling them the node type must be Any for this to be allowed.
Remove the generation feature completely from 2.1.

Either way we can revisit for 2.2.
What do you think?

odelalleau · 2021-03-07T22:41:26Z

try to support this generating dictconfig and listconfig without validation at all (and maybe even produce an error when a user tries that, telling them the node type must be Any for this to be allowed.

Remove the generation feature completely from 2.1.

To be clear (there's quite a lot in in the PR commentss to it's a bit hard to follow, sorry), the new feature of "generating dictconfig and listconfig" works fine and actually has validation (because _node_wrap() takes care of it).

What doesn't get validated is direct node interpolation, e.g., foo: ${bar} where foo is an annotated DictConfig and bar is another node. This is nothing new (it's already like this), but it makes it a bit tricky to explain in the documentaton (if we consider that it is the intended behavior).

So my suggestion would be to keep it like this but open an issue regarding the validation of container nodes for direct node interpolation (which could be fixed after the 2.1 release).

I didn't understand your comment regarding caching though: which caching mechanism are you talking about?

omry · 2021-03-08T00:51:03Z

When I say "generating", I am referring to custom resolvers returning dict or list.
This is also what I mean by caching.

We are creating a new node on the fly, is that new node cached or will it be regenerated on every access to the custom resolver?
What happens if the user mutates the returned DictConfig with and without caching?
What happens when the user is accessing other nodes via interpolation in the generated DictConfig? does that change in the presence of caching?

As for validation when accessing another node via interpolation: Since the other node is already a valid DictConfig/ListConfig, I think we can get away with shallow validation (let me know if you disagree with that assertion).
Is the top level type compatible with the type we are accessing it through?

This breaks down roughly to:

foo: ${bar}
bar: ...

Is the type of foo compatible with the type of bar?

if foo is AnyNode, that should be yes no matter what's in bar (maybe with the exception of optional none-checks).
if foo is Dict[str, int], bar must be a DictConfig[str,int] (exactly).
if foo is User, bar must be a User.
if foo is Optional[User], bar must be a none of User.

If we decide to not validate we should explain it in the docs with a sentence and consider closing the gap in a later version.
To be honest this is pretty far down the rabbit hole and the problem will effect very few people, I am not sure it justifies a lot of work (not to mention that once we start to support Dict[str, Dict[int, int]] and such it gets even more complicated.

odelalleau · 2021-03-08T02:26:56Z

When I say "generating", I am referring to custom resolvers returning dict or list.

So am I :)

This is also what I mean by caching.

We are creating a new node on the fly, is that new node cached or will it be regenerated on every access to the custom resolver?

It will be regenerated on every access. Even if the resolver is cached, only the dict/list output is cached, the node creation (and associated validation) will still happen dynamically.

What happens if the user mutates the returned DictConfig with and without caching?

Regardless of caching, the modifications will be lost next time the node is accessed, because node_wrap() re-creates a new object every time (at least based on my quick tests to check it out).

What should be the expected behavior is definitely a good question. I don't think it can make sense to allow modifications of such dynamically-generated configs, because the key we are overriding might not even exist anymore next time we access it. What if we made them read-only?

What happens when the user is accessing other nodes via interpolation in the generated DictConfig? does that change in the presence of caching?

No, because the cache is only caching the generated dict/list. So the interpolations will still be resolved dynamically regardless of whether it's cached or not.

As for validation when accessing another node via interpolation: Since the other node is already a valid DictConfig/ListConfig, I think we can get away with shallow validation (let me know if you disagree with that assertion).

Not sure what you mean by "shallow validation".

Is the top level type compatible with the type we are accessing it through?

This breaks down roughly to:
foo: ${bar}
bar: ...
Is the type of foo compatible with the type of bar?
(...)
If we decide to not validate we should explain it in the docs with a sentence and consider closing the gap in a later version.

Ok, and I would suggest to decide not to validate in 2.1 (because as you said, "this is pretty far down the rabbit hole").
Just to be be sure it's clear, doing this (by "this" I mean foo: ${bar} with bar being e.g. a DictConfig) is already supported and has no validation. So there would be no change here (just a short note in the doc)

omry · 2021-03-09T19:17:54Z

It will be regenerated on every access. Even if the resolver is cached, only the dict/list output is cached, the node creation (and associated validation) will still happen dynamically.

Okay. This could be a perf hit but I think it's okay. hopefully people will not use it to return massive dicts.

What should be the expected behavior is definitely a good question. I don't think it can make sense to allow modifications of such dynamically-generated configs, because the key we are overriding might not even exist anymore next time we access it. What if we made them read-only?

Making then read-only seems like a good idea.
If someone insists on mutating them they can unset the flag but at least they will know that they are doing something weird.

Not sure what you mean by "shallow validation".

I mean we should not re-create the whole node, must validate the the type match at the top level.

@dataclass
class Config:
  user : User
  admin: User = II("${user}")

cfg : Config = OmegaConf.structured(Config)
cfg.admin.name = "root" # this is perfectly legal and should mutate user.

In the above case, we should not re-generate the node on access. we should return the user node.
in terms of validation, we already KNOW that the user node is typed as User, same as the admin node. there is no point in re-creating the node to do validation. A shallow validation is sufficient (the type matches perfectly).
We can assume that the user node is a valid User because it was already validated when we constructed it in the first place. (If it's not valid it's a bug somewhere else, or maybe even intentional if the user forced some extra field into it later).

odelalleau · 2021-03-09T19:40:42Z

Keep the current code from this PR and pretend it's ok not to validate interpolation results for an AnyNode (even if it is None and the node is not supposed to be optional)

Only check for None vs. the is_optional flag (without calling AnyNode.validate_and_convert()

Validate interpolation results with AnyNode.validate_and_convert() and change it to accept lists and dicts

Validate interpolation results with AnyNode.validate_and_convert() and keep it as it is

My preference would be for (2) or (3) (option (4) seems really bad as it would break a bunch of stuff, just mentioning it for the sake of completeness)

Just a note that I went with option (2) in 21685e3 (because it's the least disruptive). Open to considering other options though.

odelalleau · 2021-03-09T19:45:19Z

I mean we should not re-create the whole node, must validate the the type match at the top level.

Ok thanks for the clarification, yes I agree with you.
I tried your example to be sure, it still works fine (up to minor fixes):

@dataclass
class Config:
  user : User = User()
  admin: User = II("user")
    
cfg : Config = OmegaConf.structured(Config)
cfg.admin.name = "root" # this is perfectly legal and should mutate user.
assert cfg.user.name == "root"

odelalleau · 2021-03-09T20:33:33Z

Making then read-only seems like a good idea.

Done in e25937d

omry · 2021-03-09T21:10:18Z

I owe you a full pass to see all your changes, I didn't respond to most of your comments/questions yet.

omry

Reviewed the code, didn't get to the tests.
Please read all comments before fixing things (one of them is asking to split the PR).

news/488.api_change

news/540.api_change

omry · 2021-03-09T22:49:53Z

docs/source/usage.rst

@@ -472,6 +472,19 @@ simply use quotes to bypass character limitations in strings.
    'Hello, World'


+Custom resolvers can return lists or dictionaries, that are automatically converted into DictConfig and ListConfig:


I am starting to think that we need a dedicated page for interpolations (not as a part of this PR).
Interpoaltion docs are now about 300 lines.

We can keep the really minimal use cases in usage and mention more advanced use cases and point to the interpolations doc page.

I am starting to think that we need a dedicated page for interpolations (not as a part of this PR).

Added to #535 => resolving

Edit 2021-11-22: actually unresolving as otherwise linking to this discussion doesn't work

omry · 2021-03-09T22:56:38Z

omegaconf/base.py

-            )
+        # If the output is not a Node already (e.g., because it is the output of a
+        # custom resolver), then we will need to wrap it within a Node.
+        must_wrap = not isinstance(resolved, Node)


This function is now a kilometer long.
Refactor it to reduce the size?

Done in d09277d

omegaconf/errors.py

omegaconf/base.py

omegaconf/nodes.py

omry · 2021-03-10T02:50:00Z

I had a comment I can't find now for some reason:
It basically said that AnyNode with optional False does not make any sense and that we should remove the optional flag from AnyNode and assume it's always True.

# 1.py
from typing import *

x: Any = None # fine
y: Optional[Any] = None # also fine, but redundant.

$ mypy --strict 1.py 
Success: no issues found in 1 source file

Since AnyNode is modeling Any, I think it should not have an optional flag. This will eliminate some of the confiusion

odelalleau · 2021-03-10T03:12:34Z

Since AnyNode is modeling Any, I think it should not have an optional flag. This will eliminate some of the confiusion

Agreed, I included this change in #592

This commit fixes several issues: * For nested resolvers (ex: `${f:${g:x}}`), intermediate resolver outputs (of `g` in this example) were wrapped in a ValueNode just to be unwrapped immediately, which was wasteful. This commit pushes the node wrapping to the very last step of the interpolation resolution. * There was no type checking to make sure that the result of an interpolation had a type consistent with the node's type (when specified). Now a check is made and the interpolation result may be converted into the desired type (see omry#488). * If a resolver interpolation returns a dict / list, it is now wrapped into a DictConfig / ListConfig, instead of a ValueNode. This makes it possible to generate configs from resolvers (see omry#540) Fixes omry#488 Fixes omry#540

This seems safer unless there is a clear need to instantiate it directly.

Now all exceptions are InterpolationValidationError (or a subclass), and `throw_on_resolution_failure=False` silences them all.

odelalleau · 2021-03-10T22:38:05Z

I rebased on top of master and force-pushed, now that #592 has been merged.

I cleaned up a bit the commit history. I will reference the new meaningful commits in the corresponding comments.

omegaconf/base.py

omry · 2021-03-11T18:57:28Z

tests/test_interpolation.py

+            # This example specifically test the case where intermediate resolver results
+            # are not of the same type as the key.
+            User(name="Bond", age=SI("${cast:int,${drop_last:${drop_last:7xx}}}")),


Can this be tested with identity instead? drop_last seems overly complex for what this test is doing.

User(name="Bond", age=SI("${cast:int,${identity:'7'}")),

Maybe even nest two identity calls (not sure what nesting twice is adding to the test).

Side note, this would have looked somewhat nicer if we had () syntax support. Maybe for 2.2 :).

${cast:int, ${identity:'7'} vs ${cast(int, ${identity('7'))}

I would also say that for a single argument, : is better.
So:

${cast(int, ${identity:'7')}

Can this be tested with identity instead? drop_last seems overly complex for what this test is doing.

User(name="Bond", age=SI("${cast:int,${identity:'7'}")),

Maybe even nest two identity calls (not sure what nesting twice is adding to the test).

I wanted to test explicitly the situation where there is an intermediate resolver output (by intermediate, I mean not the final result of the interpolation) that would be invalid if we tried to cast it to the node's type. The motivation was that my first implementation was doing the cast immediately after a resolver was called (even an intermediate one), which was incorrect (so I wanted a test to catch that).

If I just use identity then the string remains "7" which is actually valid when cast to the type of the node (integer). In that regard my comment motivating the test is not clear enough => I changed it in a99f776

I could also just remove that test if you feel like the extra complexity isn't worth it.

I think a combination of casts can do it then :)

User(name="Bond", age=SI("${cast:int,${cast:str,${identity:7}}}")),

Side note, this would have looked somewhat nicer if we had () syntax support. Maybe for 2.2 :).

${cast:int, ${identity:'7'} vs ${cast(int, ${identity('7'))}

I would also say that for a single argument, : is better.
So:

${cast(int, ${identity:'7')}

I guess the version I like most is this:

${cast(int, identity('7'))}

As in, once we are in an interpolation we no longer need the ${}, we are already in function mode.

I think a combination of casts can do it then :)

User(name="Bond", age=SI("${cast:int,${cast:str,${identity:7}}}")),

If I'm not mistaken the resolver outputs here are all either 7 (the integer) or "7" (the string) => both can be cast to int so that wouln't address the specific issue I wanted to pinpoint.

Yeah, I guess I don't fully understand the error you are defending against.

This was an error with the approach from https://github.com/omry/omegaconf/pull/506/files

In this approach, the call to validate_and_convert() happened immediately after a resolver was called, so in a chain of resolvers all outputs were validated (and possibly converted) according to the node's type. But that's wrong: it's only the final result that needs to be checked.

tests/test_interpolation.py

ghost · 2021-03-11T20:10:34Z

Congratulations 🎉. DeepCode analyzed your code in 2.394 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

Co-authored-by: Omry Yadan <[email protected]>

tests/conftest.py

tests/test_interpolation.py

omry · 2021-03-12T02:21:22Z

tests/test_interpolation.py

+    OmegaConf.register_new_resolver("list", lambda: cfg)
+    c = OmegaConf.create({"x": "${list:}", "y": -1})
+    assert isinstance(c.x, ListConfig)
+    assert c.x == expected


You can also add a quick check to confirm those are readonly (here and in dict).

Suggested change

assert c.x == expected

assert c.x == expected

assert x._get_node("x")._get_flag("readonly")

Done in 572e125
I ended up adding a utility function since it can't be a one-liner because of mypy

Co-authored-by: Omry Yadan <[email protected]>

omry

awesome.

omry · 2021-03-13T20:15:39Z

Can you take a last look at the description of the PR and ensure it's still correct after all the changes? I will use it as is in the commit message.

odelalleau · 2021-03-13T20:58:43Z

Can you take a last look at the description of the PR and ensure it's still correct after all the changes? I will use it as is in the commit message.

Yes it looks good. I just added the fact that resolver-generated configs are read-only.

odelalleau force-pushed the inter_validation branch from fc8b08a to 5ae4efa Compare March 3, 2021 15:36

omry reviewed Mar 4, 2021

View reviewed changes

Jasha10 reviewed Mar 5, 2021

View reviewed changes

omegaconf/nodes.py Outdated Show resolved Hide resolved

Jasha10 reviewed Mar 5, 2021

View reviewed changes

omegaconf/dictconfig.py Outdated Show resolved Hide resolved

odelalleau force-pushed the inter_validation branch from cd8655b to b556a82 Compare March 5, 2021 03:20

Repository owner deleted a comment from odelalleau Mar 7, 2021

omry reviewed Mar 9, 2021

View reviewed changes

This was referenced Mar 10, 2021

Improved handling of None values in node validation #592

Merged

Follow-up plan regarding changes to resolvers #535

Closed

odelalleau marked this pull request as draft March 10, 2021 05:16

odelalleau added 7 commits March 10, 2021 17:34

Make ValueNode an abstract class

9923827

This seems safer unless there is a clear need to instantiate it directly.

Check if conversion is needed based on type rather than object identity

1749581

Consistent exceptions when resolving interpolations

a38eb8a

Now all exceptions are InterpolationValidationError (or a subclass), and `throw_on_resolution_failure=False` silences them all.

Set wrapped nodes as read-only

69a32d2

Update news item according to suggestion

dca7e01

Add news item to describe the new feature from omry#540

36167ba

odelalleau added 3 commits March 10, 2021 17:34

InterpolationResolutionError also inherits from ValidationError

6159f3e

Add docstring for _resolve_interpolation_from_parse_tree()

3d350db

Refactor: split _resolve_interpolation_from_parse_tree()

d09277d

odelalleau force-pushed the inter_validation branch from e25937d to d09277d Compare March 10, 2021 22:36

odelalleau marked this pull request as ready for review March 10, 2021 22:37

odelalleau requested a review from omry March 10, 2021 22:45

omry requested changes Mar 11, 2021

View reviewed changes

More accurate comment

a99f776

odelalleau and others added 4 commits March 11, 2021 15:22

Update tests/test_interpolation.py

0d63bf9

Co-authored-by: Omry Yadan <[email protected]>

Use a fixture instead of duplicating the identity resolver

c2c1c03

Add comment to explain test

4af8eb5

Clearer test names

804845f

omry reviewed Mar 11, 2021

View reviewed changes

tests/conftest.py Outdated Show resolved Hide resolved

omry reviewed Mar 12, 2021

View reviewed changes

tests/test_interpolation.py Outdated Show resolved Hide resolved

tests/test_interpolation.py Outdated Show resolved Hide resolved

tests/test_interpolation.py Outdated Show resolved Hide resolved

odelalleau added 3 commits March 11, 2021 20:52

Move cast resolver to fixture in interpolation tests

96e55f3

Add comment to explain test

b9a3197

Check readonly flag directly

fefffc1

omry reviewed Mar 12, 2021

View reviewed changes

odelalleau and others added 4 commits March 11, 2021 21:34

Update tests/test_interpolation.py

5080322

Co-authored-by: Omry Yadan <[email protected]>

Check readonly flag for dict/list resolver outputs

572e125

Fix syntax error

ce92f60

Remove now redundant definition of cast()

57ee921

odelalleau requested a review from omry March 12, 2021 23:05

omry approved these changes Mar 13, 2021

View reviewed changes

omry merged commit 3bf2c59 into omry:master Mar 13, 2021

odelalleau mentioned this pull request Mar 24, 2021

Add new resolvers oc.dict.keys and oc.dict.values #643

Closed

	>>> cfg.str_key = 1234 # convert int to str (assignment)
	>>> cfg.str_key = "1234" # string value

		@@ -472,6 +472,19 @@ simply use quotes to bypass character limitations in strings.
		'Hello, World'


		Custom resolvers can return lists or dictionaries, that are automatically converted into DictConfig and ListConfig:

	assert c.x == expected
	assert c.x == expected
	assert x._get_node("x")._get_flag("readonly")

Type validation of interpolations #578

Type validation of interpolations #578

Conversation

odelalleau commented Mar 3, 2021 • edited Loading

omry left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

odelalleau commented Mar 5, 2021

odelalleau commented Mar 5, 2021

omry commented Mar 7, 2021

odelalleau commented Mar 7, 2021

omry commented Mar 8, 2021

odelalleau commented Mar 8, 2021

omry commented Mar 9, 2021

odelalleau commented Mar 9, 2021

odelalleau commented Mar 9, 2021 • edited Loading

odelalleau commented Mar 9, 2021

omry commented Mar 9, 2021

omry left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

odelalleau Mar 10, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

omry commented Mar 10, 2021

odelalleau commented Mar 10, 2021

odelalleau commented Mar 10, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ghost commented Mar 11, 2021 • edited by ghost Loading

👉 View analysis in DeepCode’s Dashboard | Configure the bot

Choose a reason for hiding this comment

odelalleau Mar 12, 2021 • edited Loading

Choose a reason for hiding this comment

omry left a comment

Choose a reason for hiding this comment

omry commented Mar 13, 2021

odelalleau commented Mar 13, 2021

odelalleau commented Mar 3, 2021 •

edited

Loading

odelalleau commented Mar 9, 2021 •

edited

Loading

odelalleau Mar 10, 2021 •

edited

Loading

ghost commented Mar 11, 2021 •

edited by ghost

Loading

odelalleau Mar 12, 2021 •

edited

Loading